What is the Kullback-Leibler divergence?

Kullback-Leibler Divergence (KL Divergence)

The Kullback-Leibler (KL) divergence, also known as relative entropy, is a non-symmetric measure of the difference between two probability distributions, P and Q. It quantifies the information lost when Q is used to approximate P. In simpler terms, it tells us how much one probability distribution is different from a second, reference probability distribution.

Key Concepts:

  • Definition: Mathematically, the KL divergence D_KL(P || Q) is defined as (see the sketch after this list for a numerical example):

    • For discrete probability distributions: D_KL(P || Q) = Σ P(i) * log(P(i) / Q(i))
    • For continuous probability distributions: D_KL(P || Q) = ∫ p(x) * log(p(x) / q(x)) dx
  • Interpretation: D_KL(P || Q) represents the expected number of extra bits (when the logarithm is taken in base 2) required to encode samples from P using a code optimized for Q rather than a code optimized for P.

  • Non-Symmetry: A crucial property of the KL divergence is that it is not symmetric: D_KL(P || Q) ≠ D_KL(Q || P) in general, so the "distance" from P to Q is not the same as the "distance" from Q to P. Because of this (and because it does not satisfy the triangle inequality), it is not a true metric.

  • Non-Negativity: KL divergence is always non-negative, i.e., D_KL(P || Q) ≥ 0. The KL divergence is zero if and only if P and Q are the same distribution.

  • Applications: KL divergence finds applications in various fields, including:

    • Machine Learning: Measuring the difference between the true data distribution and the distribution learned by a model. For example, in variational autoencoders (VAEs).
    • Information Theory: Quantifying the information gain when moving from a prior distribution to a posterior distribution.
    • Natural Language Processing (NLP): Comparing the distribution of words in different documents.
    • Bayesian Inference: Assessing the difference between prior and posterior distributions.
    • Model Selection: Choosing the best model from a set of candidates.
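
As referenced in the Definition item above, here is a minimal Python sketch of the discrete formula that also illustrates the non-symmetry and non-negativity properties. NumPy is assumed to be available, and the kl_divergence helper and the three-outcome distributions p and q are made-up examples rather than anything from a standard library:

```python
import numpy as np

def kl_divergence(p, q, base=2.0):
    """Discrete KL divergence D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)).

    Terms with P(i) = 0 contribute nothing (by the convention 0 * log 0 = 0);
    base=2 gives the result in bits, base=np.e gives it in nats.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = p > 0                         # skip outcomes that P never produces
    return np.sum(p[m] * np.log(p[m] / q[m])) / np.log(base)

# Two made-up distributions over three outcomes (each sums to 1).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

print(kl_divergence(p, q))   # D_KL(P || Q) >= 0
print(kl_divergence(q, p))   # D_KL(Q || P): generally a different value (non-symmetry)
print(kl_divergence(p, p))   # 0.0: the divergence vanishes for identical distributions
```

The mask over p > 0 encodes the usual convention that outcomes with zero probability under P contribute nothing to the sum; the case of zeros in Q is discussed under Important Considerations below.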

Important Considerations:

  • Zero Values in Q: If Q(i) = 0 while P(i) > 0, the KL divergence is infinite, because the term P(i) * log(P(i) / Q(i)) grows without bound as Q(i) approaches zero (see the sketch after this list).

  • Choice of Base for Logarithm: The base of the logarithm used in the KL divergence formula affects the units of the result. Using base 2 results in units of bits, while using the natural logarithm (base e) results in units of nats.
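
Both considerations can be seen in a short, self-contained Python sketch (again assuming NumPy; the kl helper and the example distributions are made up for illustration):

```python
import numpy as np

def kl(p, q, log=np.log2):
    """Discrete D_KL(P || Q); terms with P(i) = 0 contribute nothing by convention."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = p > 0
    return np.sum(p[m] * log(p[m] / q[m]))

# Zero values in Q: Q assigns zero probability to an outcome that P can produce,
# so the corresponding term diverges and the divergence as a whole is +inf.
with np.errstate(divide="ignore"):            # silence NumPy's divide-by-zero warning
    print(kl([0.5, 0.5], [1.0, 0.0]))         # inf

# Choice of base: base-2 logarithms give bits, natural logarithms give nats;
# the two results differ only by the constant factor ln(2).
p, q = [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]
bits = kl(p, q, log=np.log2)
nats = kl(p, q, log=np.log)
print(np.isclose(bits, nats / np.log(2)))     # True: bits = nats / ln(2)
```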

Related Concepts:

  • Cross-Entropy: The cross-entropy between two probability distributions P and Q is defined as H(P, Q) = -Σ P(i) * log(Q(i)). The KL divergence can be expressed as D_KL(P || Q) = H(P, Q) - H(P), where H(P) is the entropy of P. Therefore, minimizing the KL divergence is equivalent to minimizing the cross-entropy if the entropy of P is constant.
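
As a quick numerical check of this identity (assuming NumPy and base-2 logarithms, with the same illustrative distributions used earlier):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

entropy_p     = -np.sum(p * np.log2(p))        # H(P)
cross_entropy = -np.sum(p * np.log2(q))        # H(P, Q)
kl_pq         =  np.sum(p * np.log2(p / q))    # D_KL(P || Q)

print(np.isclose(kl_pq, cross_entropy - entropy_p))   # True: D_KL(P || Q) = H(P, Q) - H(P)
```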
